Corpus-Driven Contextualized Categorization
Authors
Abstract
Ontologies strive to offer an interconnected, hierarchical system of categories to guide our actions in a complex world. But the boundaries of these categories are highly context-dependent, and what constitutes a prototypical category member in one context may be atypical or unrepresentative in another. In this paper we outline a dynamic, trainable, bottom-up view of category structure based on context-sensitive corpus analysis. By learning from corpora about how people actually use categories in different contexts, we can train our ontologies to creatively adapt themselves to these...
Similar resources
Concordance-Based Data-Driven Learning Activities and Learning English Phrasal Verbs in EFL Classrooms
In spite of the highly beneficial applications of corpus linguistics in language pedagogy, it has not found its way into mainstream EFL. The major reasons seem to be the teachers’ lack of training and the unavailability of resources, especially computers in language classes. Phrasal verbs have been shown to be a problematic area of learning English as a foreign language due to their semantic op...
A Corpus-Driven Study of the Variation of Co-Occurrence Patterns in Written and Spoken Registers
This paper will focus on the study of the variation of co-occurrence patterns encountered in written and spoken registers, through the analysis of a large lexical database of corpus-extracted multiword expressions (MWEs) of European Portuguese. Those MWEs were automatically extracted from a balanced 50 million word written corpus and a 1 million word spoken corpus, furthermore statistically int...
Sense Contextualization in a Dependency-Based Compositional Distributional Model
Little attention has been paid to distributional compositional methods which employ syntactically structured vector models. As word vectors belonging to different syntactic categories have incompatible syntactic distributions, no trivial compositional operation can be applied to combine them into a new compositional vector. In this article, we generalize the method described by Erk and Padó (20...
A Wikipedia-based Corpus for Contextualized Machine Translation
We describe a corpus for and experiments in target-contextualized machine translation (MT), in which we incorporate language models from target-language documents that are comparable in nature to the source documents. This corpus comprises (i) a set of curated English Wikipedia articles describing news events along with (ii) their comparable Spanish counterparts, (iii) a number of the Spanish s...
Transcultural categorization in contextualized domains
Introduction. This study takes classifications of musical instruments from three different cultural regions to show that the model of knowledge organization in use is not appropriate for cultural integration. Method. The set of categories used for the analysed instruments has been taken from previous work of M. Kartomi and M. López-Huertas. Analysis. The selected categories have been processe...